Abstract — Staircase codes, a new class of forward-error-correction (FEC) codes suitable for high-speed optical communications, are introduced. An ITU-T G.709-compatible staircase code with rate $R = 239/255$ is proposed, and FPGA-based simulation results are presented, exhibiting a net coding gain (NCG) of 9.41 dB at an output error rate of $10^{-15}$, an improvement of 0.42 dB relative to the best code from the ITU-T G.975.1 recommendation. An error floor analysis technique is presented, and the proposed code is shown to have an error floor at $4.0 \times 10^{-21}$.

Index Terms — Staircase codes, fiber-optic communications, forward error correction, product codes, low-density parity-check codes.



I. Introduction 

ADVANCES in physics — the invention of the laser, low-loss optical fiber, and the optical amplifier — have driven the exponential growth in worldwide data communications. However, as these technologies mature, system designers have increasingly focused on techniques from communication theory, including forward error correction, to simultaneously increase transmission capacity and decrease transmission costs.

One of the first proposals for FEC in an optical system appeared in [1], which demonstrated a shortened (224, 216) Hamming code implementation at 565 Mbit/s. Since then, ITU-T Recommendations G.975 and G.975.1 have standardized more powerful codes for optical transport networks (OTNs). More recently, low-density parity-check (LDPC) codes [2], [3], which provide the potential for capacity-approaching performance, have been investigated, as aptly summarized in [4], [5]. While implementations exist at 10 Gb/s (for 10GBase-T Ethernet networks), the blocklengths of such implementations (approximately 500 to 2000 bits) are too short to provide performance close to capacity; the (2048, 1723) RS-LDPC code is approximately 3 dB from the Shannon limit at $10^{-15}$ [6]; see also [7]. Another significant roadblock is that fiber-optic communication systems are typically required to provide bit-error rates below $10^{-15}$. It is well known that capacity-approaching LDPC codes exhibit error floors [8], and achieving the targeted error rate would likely require concatenation with an outer code (e.g., as in [9]). In this work, we focus on product-like codes (by product-like codes, we mean any generalized LDPC code with algebraic component codes), since they possess properties that make them particularly suited to providing error correction in fiber-optic communication systems. In particular, for 100 Gb/s implementations, we argue that syndrome-based decoding of product-like codes is significantly more efficient than message-passing decoding of LDPC codes.

This paper presents a new class of high-rate binary error-correcting codes, staircase codes, whose construction combines ideas from convolutional and block coding. Indeed, staircase codes can be interpreted as having a 'continuous' product-like construction. In the context of wireless communications, related code constructions include braided block codes [10], braided convolutional codes [11], diamond codes [12] and cross parity check convolutional codes [13], each of which is related to the recurrent codes of Wyner-Ash [14]. However, these proposals considered soft decoding of the component codes, which is unsuitable for high-speed fiber-optic communications. Herein, we describe a syndrome-based decoder for staircase codes that provides excellent performance with an efficient decoder implementation.

In Section II, we review the specifications and performance of FEC codes defined in ITU-T Recommendations G.975 and G.975.1. In Section III, we describe the syndrome-based decoder for product-like codes, and argue that it results in a decoder data-flow that is more than two orders of magnitude smaller than that of the message-passing decoder of an LDPC code. Staircase codes are presented in Section IV, and a G.709-compatible staircase code is proposed. In Section V, we present an analytical method for determining the error floor of iteratively decoded staircase codes, and show that the proposed staircase code has an error floor at $4.0 \times 10^{-21}$. Finally, in Section VI, we present FPGA-based simulation results, illustrating that the proposed code provides a 9.41 dB NCG at an output error rate of $10^{-15}$, an improvement of 0.42 dB relative to the best code from the ITU-T G.975.1 recommendation, and only 0.56 dB from the Shannon limit.

II. Existing Proposals 

A. ITU-T Recommendation G.975 

The first error-correction code standardized for optical communications was the (255, 239) Reed-Solomon code, with symbols in $\mathbb{F}_{2^8}$, capable of correcting up to 8 symbol errors in any codeword. For an output error rate of $10^{-15}$, the NCG of the RS code is 6.2 dB, which is 3.77 dB from capacity.

In order to provide improved burst-error correction, 16 codewords are block-interleaved, providing correction for bursts of as many as 1024 transmitted bits. A framing row consists of $16 \cdot 255 \cdot 8 = 32640$ bits, 30592 of which are information bits, and the remaining 2048 bits of which are parity. The resulting framing structure (a frame consists of four rows) is standardized in ITU-T Recommendation G.709, and remains the required framing structure for OTNs; as a direct result, the coding rate of any candidate code must be $R = 239/255$.
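The framing arithmetic above can be checked with a few lines of Python. This is only a bookkeeping sketch; the constant names are illustrative and are not taken from G.709 itself.

```python
RS_N, RS_K = 255, 239            # (255, 239) Reed-Solomon code over GF(2^8)
SYMBOL_BITS = 8                  # bits per RS symbol
INTERLEAVE = 16                  # block-interleaved codewords per framing row
ROWS_PER_FRAME = 4               # a G.709 frame consists of four framing rows
T_SYMBOLS = (RS_N - RS_K) // 2   # correctable symbol errors per codeword (8)

row_bits = INTERLEAVE * RS_N * SYMBOL_BITS     # bits per framing row
row_info = INTERLEAVE * RS_K * SYMBOL_BITS     # information bits per row
row_parity = row_bits - row_info               # parity bits per row
frame_bits = ROWS_PER_FRAME * row_bits         # bits per frame
frame_info = ROWS_PER_FRAME * row_info         # information bits per frame

# Burst length quoted in the text: 8 correctable symbols in each of the
# 16 interleaved codewords, 8 bits per symbol.
burst_bits = INTERLEAVE * T_SYMBOLS * SYMBOL_BITS

print(row_bits, row_info, row_parity)   # 32640 30592 2048
print(frame_bits, frame_info)           # 130560 122368
```

The ratio `frame_info / frame_bits` reduces to 239/255, which is why any replacement code must keep this rate.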

B. ITU-T Recommendation G.975.1

As per-channel data rates increased to 10 Gb/s, and the capabilities of high-speed electronics improved, the (255, 239) RS code was replaced with stronger error-correcting codes. In ITU-T Recommendation G.975.1, several 'next-generation' coding schemes were proposed; among the many proposals, the common mechanism for increased coding gain was the use of concatenated coding schemes with iterative hard-decision decoding. We now describe four of the best proposals, which will motivate our approach in Section IV.

In Appendix I.3 of G.975.1, a serially concatenated coding scheme is described, with outer (3860, 3824) binary BCH code and inner (2040, 1930) binary BCH code, which are obtained by shortening their respective mother codes. First, $30592 = 8 \cdot 3824$ information bits are divided into 8 units, each of which is encoded by the outer code; we will refer to the resulting unit of 30880 bits as a 'block'. Prior to encoding by the inner code, the contents of consecutive blocks are interleaved (in a 'continuous' fashion, similar to convolutional interleavers [15]). Specifically, each inner codeword in a given block involves 'information' bits from each of the eight preceding 'outer' blocks. Note that the interleaving step increases the effective block length of the overall code, but it necessitates a sliding-window style decoding algorithm, due to the continuous nature of the interleaver. Furthermore, unlike in a product code, the parity bits of the inner code are protected by a single component codeword, which reduces their level of protection. For an output error rate of $10^{-15}$, the NCG of the I.3 code is 8.99 dB, which is 0.98 dB from capacity.
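A similar bookkeeping check for the I.3 scheme is sketched below. It assumes, for counting purposes only, that the bits of one block exactly fill the information parts of a whole number of inner codewords; it is not a description of how the I.3 interleaver actually assigns bits.

```python
OUTER_N, OUTER_K = 3860, 3824   # shortened outer BCH code (Appendix I.3)
INNER_N, INNER_K = 2040, 1930   # shortened inner BCH code (Appendix I.3)
UNITS = 8                       # outer codewords per block

info_bits = UNITS * OUTER_K     # information bits per block
block_bits = UNITS * OUTER_N    # bits in one 'block' after outer encoding

# How many inner information words one block fills, and what is left over.
inner_words, leftover = divmod(block_bits, INNER_K)
# Adding the inner parity to those words gives the coded length per block.
coded_bits = inner_words * INNER_N

print(info_bits, block_bits, inner_words, leftover, coded_bits)
# 30592 30880 16 0 32640
```

Under that assumption one block maps onto 16 inner codewords, i.e., exactly one 32640-bit framing row at rate 239/255.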

In Appendix I.4 of G.975.1, a serially concatenated scheme with (shortened versions of) an outer (1023, 1007) RS code and (shortened versions of) an inner (2047, 1952) binary BCH code is proposed. After encoding 122368 bits with the outer code, the coded bits are block-interleaved and encoded by the inner BCH code, resulting in a block length of 130560 bits, i.e., exactly one G.709 frame. As in the previous case, the parity bits of the inner code are singly protected. For an output error rate of $10^{-15}$, the NCG of the I.4 code is 8.67 dB, which is 1.3 dB from capacity.

In Appendix I.5 of G.975.1, a serially concatenated scheme with an outer (1901, 1855) RS code and an inner $(512, 502) \times (510, 500)$ extended-Hamming product code is described. Iterative decoding is applied to the inner product code, after which the outer code is decoded; the purpose of the outer code is to eliminate the error floor of the inner code, since the inner code has small stall patterns (see Section V). For an output error rate of $10^{-15}$, the NCG of the I.5 code is 8.5 dB, which is 1.47 dB from capacity.

Finally, in Appendix I.9 of G.975.1, a product-like code with (1020, 988) doubly-extended binary BCH component codes is proposed. The overall code is described in terms of a $512 \times 1020$ matrix of bits, in which the bits along both the rows of the matrix and a particular choice of 'diagonals' must form valid codewords in the component code. Since the diagonals are chosen to include 2 bits in every row, any diagonal codeword has two bits in common with any row codeword; in contrast, for a product code, any row and column have exactly one bit in common. Note that the I.9 construction achieves a product-like construction (its choice of diagonals ensures that each bit is protected by two component codewords) with essentially half the overall block length of the related product code (even so, the I.9 code has the longest block length among all G.975.1 proposals). However, the choice of diagonals decreases the size of the smallest stall patterns, introducing an error floor above $10^{-14}$. For an output error rate of $2 \cdot 10^{-14}$, the NCG of the I.9 code is 8.67 dB, which is 1.3 dB from capacity.

III. LDPC vs. Product Codes 

In this section, we present a high-level view of iterative decoders for LDPC and product codes. Due to the differences in their implementations, a precise comparison of their implementation complexities is difficult. Nevertheless, since the communication complexity of message-passing is a significant challenge in LDPC decoder design, we consider the decoder data-flow, i.e., the rate of routing/storing messages, as a surrogate for the implementation complexity.

A. Decoder-Data-flow Comparison 

We consider a system that transmits information at $D$ bits/s, using a binary error-correcting code of rate $R$ (for which hard decisions at $D/R$ bits/s are input to the decoder), and a decoder that operates at a clock frequency $f_c$ Hz.

1) LDPC Code: We consider an LDPC decoder that implements sum-product decoding (or some quantized approximation) with a parallel-flooding schedule. We assume $q$-bit messages internal to the decoder, an average variable node degree $d_{\mathrm{av}}$, and $N$ decoder iterations; typically, $q$ is 4 or 5 bits, $d_{\mathrm{av}} \approx 3$, and $N \approx 15$ to 25. Initially, hard decisions are input to the decoder at a rate of $D/R$ bits/s and stored in flip-flop registers. At each iteration, variable nodes compute and broadcast $q$-bit messages over every edge, and similarly for the check nodes, i.e., $2qd_{\mathrm{av}}$ bits are broadcast per iteration per variable node. Since bits arrive from the channel at $D/R$ bits/s, the corresponding internal data-flow per iteration is then $2Dqd_{\mathrm{av}}/R$, and the total data-flow, including initial loading of 1-bit channel messages, is

$$F_{\mathrm{LDPC}} = \frac{D}{R} + \frac{2NDqd_{\mathrm{av}}}{R}.$$

For $N = 20$, $q = 4$, $d_{\mathrm{av}} = 3$, $F_{\mathrm{LDPC}} \approx 480D/R$, which corresponds to a data-flow of more than 48 Tb/s for 100 Gb/s systems.
Fig. 1. Data-flow in an LDPC decoder.
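The LDPC data-flow expression can be evaluated directly. The sketch below assumes $D = 100$ Gb/s and $R = 239/255$; the function name is hypothetical.

```python
def ldpc_dataflow(D, R, N, q, d_av):
    """Total LDPC decoder data-flow in bits/s: F_LDPC = D/R + 2*N*D*q*d_av/R."""
    channel_load = D / R                     # 1-bit hard decisions, loaded once
    per_iteration = 2 * D * q * d_av / R     # q-bit messages in both directions
    return channel_load + N * per_iteration


D = 100e9            # 100 Gb/s information rate
R = 239 / 255        # G.709 code rate
F = ldpc_dataflow(D, R, N=20, q=4, d_av=3)
print(f"{F / (D / R):.0f} x D/R = {F / 1e12:.1f} Tb/s")   # 481 x D/R = 51.3 Tb/s
```

The exact multiplier is $1 + 2Nqd_{\mathrm{av}} = 481$, which the text rounds to 480.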



2) Product Code: When the component codes of a product code can be efficiently decoded via syndromes (e.g., BCH codes), there exists an especially efficient decoder for the product code. Briefly, by operating exclusively in the 'syndrome domain' (which compresses the received signal) and passing only $\le t$ messages per (component) decoding (for $t$-error-correcting component codes), the implementation complexity of decoding is significantly reduced.

The following is a step-by-step description of the decoding 
algorithm: 

1) From the received data, compute and store the syndrome for each row and column codeword. Store a copy of the received data in memory R.

2) Decode those non-zero syndromes corresponding to row codewords¹. In the event of a successful decoding, set the syndrome to zero, flip the corresponding $t$ or fewer positions in memory R, and update the $t$ or fewer affected column syndromes by a masking operation.

3) Repeat Step 2, reversing the roles of rows and columns.

4) If any syndromes are non-zero, and fewer than the maximum number of iterations have been performed, go to Step 2. Otherwise, output the contents of memory R.
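To make Steps 1 to 4 concrete, here is a minimal Python sketch of the syndrome-domain decoder. It is not the implementation described in this paper: as an assumption for brevity, both the row and the column code are the (7,4) Hamming code ($t = 1$), whose decoder maps every non-zero syndrome to a single bit position, so every decoding attempt 'succeeds' (possibly by miscorrecting). As in the algorithm above, only syndromes, flipped positions and mask updates are exchanged.

```python
# Hypothetical component code for rows and columns: the (7,4) Hamming code
# (t = 1), used in place of BCH codes to keep the sketch short. Column j of its
# parity-check matrix is the binary expansion of j + 1, so a non-zero syndrome s
# identifies position s - 1.
N = 7
H_COLS = [j + 1 for j in range(N)]


def syndrome(bits):
    """Syndrome of a received word: XOR of the H-columns at its non-zero positions."""
    s = 0
    for j, b in enumerate(bits):
        if b:
            s ^= H_COLS[j]
    return s


def decode_product(received, max_iters=4):
    """Syndrome-domain iterative decoding of an N x N product code (Steps 1-4)."""
    R = [row[:] for row in received]                          # memory R
    row_syn = [syndrome(R[i]) for i in range(N)]              # Step 1
    col_syn = [syndrome([R[i][j] for i in range(N)]) for j in range(N)]
    for _ in range(max_iters):
        for i in range(N):                                    # Step 2: rows
            if row_syn[i]:
                j = row_syn[i] - 1          # Hamming decoding: error position
                R[i][j] ^= 1                # flip the bit in memory R
                row_syn[i] = 0              # the row is now a codeword
                col_syn[j] ^= H_COLS[i]     # mask update of the affected column
        for j in range(N):                                    # Step 3: columns
            if col_syn[j]:
                i = col_syn[j] - 1
                R[i][j] ^= 1
                col_syn[j] = 0
                row_syn[i] ^= H_COLS[j]
        if not any(row_syn) and not any(col_syn):             # Step 4
            break
    return R


received = [[0] * N for _ in range(N)]        # all-zero codeword was sent
received[0][1] = received[0][5] = 1           # two errors in row 0
print(decode_product(received) == [[0] * N for _ in range(N)])   # True
```

In the example, the row decoder miscorrects row 0 (adding a third error), but the three affected columns each see a single error and clean it up. With BCH components, the `row_syn[i] - 1` lookup would be replaced by a bounded-distance decoder that can also report failure, in which case the syndrome is left unchanged.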

We quantify the complexity of decoding a product code by its decoder data-flow. At first glance, it may seem that this approach ignores the complexity of decoding the (component) $t$-error-correcting BCH codewords. However, for relatively small $t$, the decoding of a component codeword can be efficiently decomposed into a series of look-up table operations, for which the data-flow interpretation is well-justified. In this section, we will ignore the data-flow contribution of the BCH decoding algorithm, but we return to this point in the Appendix, where it is shown that the corresponding data-flow is negligible.

We assume that rows are encoded by a $t_1$-error-correcting $(n_1, k_1 = n_1 - r_1)$ BCH code, and the columns are encoded by a $t_2$-error-correcting $(n_2, k_2 = n_2 - r_2)$ BCH code, for an overall rate $R = R_1 R_2$, where $R_1 = k_1/n_1$ and $R_2 = k_2/n_2$. We assume each row/column codeword is decoded (on average, over the course of decoding the overall product code) $v$ times, where typically $v$ ranges from 3 to 4.

The hard decisions from the channel, at $D/R$ bits/s, are written to a data RAM, in addition to being processed by a syndrome computation/storage device. In contrast to the LDPC decoder data-flow, the clock frequency $f_c$ plays a central role, namely in the data-flow of the initial syndrome calculation.

¹In practice, the syndrome corresponding to a fixed row is decoded only if its value has changed since its last decoding.

Fig. 2. Data-flow in the initial syndrome computation.

Referring to Fig. 2, and assuming that the bits in a product code are transmitted row-by-row, the input bus-width (i.e., the number of input bits per decoder clock cycle) is $D/(Rf_c)$ bits. Now, assuming these bits correspond to a single row of the product code, each non-zero bit corresponds to some $r_1$-bit mask (i.e., the corresponding column of the parity-check matrix of the row code), the modulo-2 sum of these masks is performed by a masking tree, and the $r_1$-bit output is masked with the current contents of the corresponding (syndrome) flip-flop register. That is, each clock cycle causes an $r_1$-bit mask to be added to the contents of the corresponding row in the syndrome bank. Of course, each received bit also impacts a distinct column syndrome; however, the same $r_2$-bit mask is applied (when the corresponding received bit is non-zero) to each of the involved column syndromes, so the corresponding data-flow is $r_2$ bits per clock cycle.
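The masking-tree computation can be sketched in Python by representing each $r$-bit mask as an integer. The parity-check columns `h_cols` are hypothetical inputs; the sketch only illustrates that accumulating one bus-width chunk per clock cycle yields the same syndrome as processing the whole row at once, because the syndrome is linear in the received bits.

```python
from functools import reduce
from operator import xor


def masking_tree(chunk, masks):
    """Modulo-2 sum of the r-bit masks belonging to the non-zero bits of one chunk."""
    return reduce(xor, (m for b, m in zip(chunk, masks) if b), 0)


def accumulate_syndrome(row_bits, h_cols, bus_width):
    """Build a row syndrome one clock cycle (bus_width received bits) at a time."""
    register = 0                                   # syndrome flip-flop register
    for start in range(0, len(row_bits), bus_width):
        chunk = row_bits[start:start + bus_width]
        register ^= masking_tree(chunk, h_cols[start:start + bus_width])
    return register


# Toy example: (7,4) Hamming parity-check columns 1..7, three bits per cycle.
print(accumulate_syndrome([1, 0, 1, 1, 0, 0, 1], [j + 1 for j in range(7)], 3))   # 1
```

For scale, with $D = 100$ Gb/s, $R = 239/255$ and $f_c = 400$ MHz, the bus width $D/(Rf_c)$ is about 267 bits per clock cycle.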

Once the syndromes are computed from the received data, iterative decoding commences. To perform a row decoding, an $r_1$-bit syndrome is read from the syndrome bank. Since there are $n_2$ row codewords, and each row is decoded on average $v$ times, the corresponding data-flow from the syndrome bank to the row decoder is $r_1 n_2 vD/(Rn_1n_2) = r_1vD/(Rn_1)$ bits/s. For each row decoding, at most $t_1$ positions are corrected, each of which is specified by $\lceil \log_2 n_1 \rceil + \lceil \log_2 n_2 \rceil$ bits. Therefore, the data-flow from the row decoder to the data RAM is

$$\frac{t_1 n_2 vD\left(\lceil \log_2 n_1 \rceil + \lceil \log_2 n_2 \rceil\right)}{Rn_1n_2} = \frac{t_1 vD\left(\lceil \log_2 n_1 \rceil + \lceil \log_2 n_2 \rceil\right)}{Rn_1}$$

bits/s. Furthermore, for each corrected bit, an $r_2$-bit mask must be applied to the corresponding column syndrome, which yields a data-flow from the row decoder to the syndrome bank of $t_1 n_2 r_2 vD/(Rn_1n_2) = t_1 r_2 vD/(Rn_1)$ bits/s. A similar analysis can be applied to column decodings. In total, the decoder data-flow is

$$F_P = \frac{D}{R} + (r_1 + r_2) f_c + \frac{Dv}{Rn_1}\left(t_1 \lceil \log_2 n_1 \rceil + t_1 \lceil \log_2 n_2 \rceil + r_1 + t_1 r_2\right) + \frac{Dv}{Rn_2}\left(t_2 \lceil \log_2 n_1 \rceil + t_2 \lceil \log_2 n_2 \rceil + r_2 + t_2 r_1\right).$$



In this work, we will focus on codes for which $n_1 = n_2 \approx 1000$, $r_1 = r_2 = 32$, $t_1 = t_2 = 3$, and the decoder is assumed to operate at $f_c \approx 400$ MHz. For $v = 4$ and $D = 100$ Gb/s, we then have a data-flow of approximately 293 Gb/s. Note that this is more than two orders of magnitude smaller than the corresponding data-flow for LDPC decoding. Intuitively, the advantage arises from two facts. First, when $R_1 > 1/2$ and $R_2 > 1/2$, syndromes provide a compressed representation of the received signal. Second, the algebraic component codes admit an economical message-passing scheme, in the sense that message updates are only required for the small fraction of bits that are corrected by a particular (component code) decoding.

Fig. 3. Data-flow in a product-code decoder.

Fig. 4. The 'staircase' visualization of staircase codes.
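The two data-flow expressions of Section III-A can be compared numerically. The following sketch evaluates $F_P$ with the parameters above and $D = 100$ Gb/s, taking $R = R_1R_2$ with $R_1 = R_2 = 968/1000$ (an assumption consistent with $n = 1000$, $r = 32$), and compares it with $F_{\mathrm{LDPC}}$ for $N = 20$, $q = 4$, $d_{\mathrm{av}} = 3$. The function name is hypothetical.

```python
from math import ceil, log2


def product_dataflow(D, R, fc, v, n1, n2, r1, r2, t1, t2):
    """Decoder data-flow F_P (bits/s) of the syndrome-based product-code decoder."""
    a1, a2 = ceil(log2(n1)), ceil(log2(n2))   # address bits along a row / a column
    initial = D / R + (r1 + r2) * fc          # data RAM writes + syndrome computation
    rows = D * v / (R * n1) * (t1 * a1 + t1 * a2 + r1 + t1 * r2)
    cols = D * v / (R * n2) * (t2 * a1 + t2 * a2 + r2 + t2 * r1)
    return initial + rows + cols


D = 100e9
R = (1 - 32 / 1000) ** 2                      # R = R1 * R2 with n = 1000, r = 32
F_P = product_dataflow(D, R, fc=400e6, v=4, n1=1000, n2=1000,
                       r1=32, r2=32, t1=3, t2=3)
F_LDPC = D / R + 2 * 20 * D * 4 * 3 / R       # N = 20, q = 4, d_av = 3
print(f"F_P = {F_P / 1e9:.0f} Gb/s, F_LDPC / F_P = {F_LDPC / F_P:.0f}")
# F_P = 293 Gb/s, F_LDPC / F_P = 175
```

Most of $F_P$ comes from the initial loading ($D/R$) and from the per-correction mask updates ($t_1 r_2$ and $t_2 r_1$ terms); the syndrome reads themselves are comparatively cheap.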

IV. Staircase Codes 

The staircase code construction combines ideas from recursive convolutional coding and block coding. Staircase codes are completely characterized by the relationship between successive matrices of symbols. Specifically, consider the (infinite) sequence $B_0, B_1, B_2, \ldots$ of $m$-by-$m$ matrices $B_i$, $i \in \mathbb{Z}^+$. Herein, we restrict our attention to $B_i$ with elements in $\mathbb{F}_2$, but an analogous construction applies in the non-binary case.

Block $B_0$ is initialized to a reference state known to the encoder-decoder pair, e.g., block $B_0$ could be initialized to the all-zeros state, i.e., an $m$-by-$m$ array of zero symbols. Furthermore, we select a conventional FEC code (e.g., Hamming, BCH, Reed-Solomon, etc.) in systematic form to serve as the component code; this code, which we henceforth refer to as $C$, is selected to have block length $2m$ symbols, $r$ of which are parity symbols.

Encoding proceeds recursively on the $B_i$. For each $i$, $m(m - r)$ information symbols (from the streaming source) are arranged into the $m - r$ leftmost columns of $B_i$; we denote this sub-matrix by $B_{i,L}$. Then, the entries of the rightmost $r$ columns (this sub-matrix is denoted by $B_{i,R}$) are specified as follows:

1) Form the $m \times (2m - r)$ matrix $A = [B_{i-1}^T \; B_{i,L}]$, where $B_{i-1}^T$ is the matrix-transpose of $B_{i-1}$.

2) The entries of $B_{i,R}$ are then computed such that each of the rows of the matrix $[B_{i-1}^T \; B_{i,L} \; B_{i,R}]$ is a valid codeword of $C$. That is, the elements in the $j$th row of $B_{i,R}$ are exactly the $r$ parity symbols that result from encoding the $2m - r$ 'information' symbols in the $j$th row of $A$.
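The two encoding steps can be illustrated with a toy Python encoder. To keep it self-contained, it assumes hypothetical parameters $m = 7$, $r = 4$ and a (14, 10) shortened Hamming code as the systematic component code $C$: the information positions use the non-power-of-two 4-bit values as parity-check columns and the parity positions use the identity columns, so the parity bits are simply the bits of the information syndrome. It sketches the recursion only and is not a practical staircase code.

```python
# Hypothetical toy parameters: m = 7, r = 4, and a (14, 10) shortened Hamming
# code as the systematic component code C (the paper uses BCH codes).
M, R_PAR = 7, 4
INFO_H = [c for c in range(1, 16) if c & (c - 1)][: 2 * M - R_PAR]  # non-powers of 2
PAR_H = [1 << k for k in range(R_PAR)]                              # identity part of H
H_COLS = INFO_H + PAR_H                                             # all 2m columns


def syndrome(word):
    """XOR of the parity-check columns at the word's non-zero positions."""
    s = 0
    for b, h in zip(word, H_COLS):
        if b:
            s ^= h
    return s


def encode_component(info):
    """Systematic encoding of 2m - r information bits: append r parity bits."""
    s = syndrome(info)                    # only the 2m - r information columns contribute
    return info + [(s >> k) & 1 for k in range(R_PAR)]   # parity cancels s exactly


def transpose(B):
    return [list(col) for col in zip(*B)]


def staircase_encode(info_blocks):
    """Encode a list of m x (m - r) information sub-matrices B_{i,L} into blocks B_i."""
    prev = [[0] * M for _ in range(M)]    # B_0: all-zeros reference block
    blocks = []
    for info in info_blocks:
        prev_t = transpose(prev)          # B_{i-1}^T
        block = []
        for j in range(M):
            a_row = prev_t[j] + info[j]   # jth row of A = [B_{i-1}^T  B_{i,L}]
            codeword = encode_component(a_row)
            block.append(codeword[M:])    # jth row of B_i = [B_{i,L} row | B_{i,R} row]
        blocks.append(block)
        prev = block
    return blocks


blocks = staircase_encode([[[1, 0, 1]] * M, [[0, 1, 1]] * M])
print(len(blocks), len(blocks[0]), len(blocks[0][0]))   # 2 7 7
```

The toy code has rate $1 - r/m = 3/7$, and every row of $[B_{i-1}^T \; B_i]$ has zero syndrome by construction.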

Generally, the relationship between successive blocks in a staircase code satisfies the following relation: for any $i \ge 1$, each of the rows of the matrix $[B_{i-1}^T \; B_i]$ is a valid codeword in $C$. An equivalent description, from which the term 'staircase codes' originates, is suggested by Fig. 4, in which (the concatenation of the symbols in) every row (and every column) in the 'staircase' is a valid codeword of $C$; this representation suggests their connection to product codes. However, staircase codes are naturally unterminated (i.e., their block length is indeterminate), and thus admit a range of decoding strategies with varying latencies. Most importantly, we will see that they outperform product codes.

The rate of a staircase code is

$$R_s = 1 - \frac{r}{m},$$

since encoding produces $r$ parity symbols for each set of $m - r$ 'new' information symbols. However, note that the related product code has rate

$$R_p = \left(\frac{2m - r}{2m}\right)^2 = 1 - \frac{r}{m} + \frac{r^2}{4m^2},$$

which is greater than the rate of the staircase code. Nevertheless, for sufficiently high rates the difference is small, and staircase codes outperform product codes of the same rate.
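The rate comparison is easy to check with exact fractions. For the G.709 parameters $m = 510$, $r = 32$ used later, the staircase rate is exactly 239/255, and the related product code's rate exceeds it by only $r^2/(4m^2)$:

```python
from fractions import Fraction


def staircase_rate(m, r):
    return 1 - Fraction(r, m)                # R_s = 1 - r/m


def product_rate(m, r):
    return Fraction(2 * m - r, 2 * m) ** 2   # R_p = ((2m - r) / 2m)^2


m, r = 510, 32
Rs, Rp = staircase_rate(m, r), product_rate(m, r)
print(Rs, f"{float(Rp - Rs):.2e}")   # 239/255 9.84e-04
```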

From the perspective of transmitter latency (which includes encoding latency and frame-mapping latency), staircase codes have the advantage (relative to product codes) that the effective rate (i.e., the ratio of 'new' information symbols, $m - r$, to the total number of 'new' symbols, $m$) of a component codeword is exactly the rate of the overall code. Therefore, the encoder produces parity at a 'regular' rate, which enables the design of a frame-mapper that minimizes the transmitter latency.

We note that staircase codes can be interpreted as generalized LDPC codes with a systematic encoder and an indeterminate block length, which admits decoding algorithms with a range of latencies.

Using arguments analogous to those used for product codes, if the $t$-error-correcting component code $C$ has minimum distance $d_{\min}$, then the Hamming distance between any two staircase codewords is at least $d_{\min}^2$.

A. Decoding Algorithm 

Staircase codes are naturally unterminated (i.e., their block length is indeterminate), and thus admit a range of decoding strategies with varying latencies. That is, decoding can be accomplished in a sliding-window fashion, in which the decoder operates on the received bits corresponding to $L$ consecutively received blocks $B_i, B_{i+1}, \ldots, B_{i+L-1}$. For a fixed $i$, the decoder iteratively decodes as follows. First, those component codewords that 'terminate' in block $B_{i+L-1}$ (i.e., whose parity bits are in $B_{i+L-1}$) are decoded; since every symbol is involved in two component codewords, the corresponding syndrome updates are performed, as in Section III-A2. Next, those codewords that terminate in block $B_{i+L-2}$ are decoded. This process continues until those codewords that terminate in block $B_i$ are decoded. Now, since decoding those codewords terminating in some block $B_j$ affects those codewords that terminate in block $B_{j+1}$, it is beneficial to return to $B_{i+L-1}$ and to repeat the process. This iterative process continues until some maximum number of iterations is performed, at which time the decoder outputs its estimate for the contents of $B_i$, accepts a new block $B_{i+L}$, and the entire process repeats (i.e., the decoding window slides one block to the 'right').

Fig. 5. A multi-edge-type graphical representation of staircase codes. $\Pi$ is a standard block interleaver, i.e., it represents the transpose operation on an $m$-by-$m$ matrix.
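The sliding-window schedule can be written as a small driver loop. In this Python sketch, `decode_terminating(b)` is a hypothetical callback standing in for the component decodings of all codewords that terminate in $B_b$, together with their syndrome updates; the driver only fixes the order of those decodings and the point at which each block is released.

```python
def sliding_window_decode(num_blocks, L, max_iters, decode_terminating):
    """Drive the sliding-window schedule over received blocks B_1 ... B_num_blocks.

    decode_terminating(b) is a hypothetical callback that decodes every component
    codeword whose parity bits lie in block B_b and applies the resulting syndrome
    updates; its return value is ignored. Yields i each time B_i is output.
    """
    for i in range(1, num_blocks - L + 2):          # window B_i, ..., B_{i+L-1}
        for _ in range(max_iters):
            for b in range(i + L - 1, i - 1, -1):   # newest block first, back to B_i
                decode_terminating(b)
        yield i                                     # output B_i, slide right by one


order = []
released = list(sliding_window_decode(5, 4, 2, order.append))
print(released, order[:8])   # [1, 2] [4, 3, 2, 1, 4, 3, 2, 1]
```

Blocks near the end of a finite test sequence are never released by this driver; a streaming decoder simply keeps accepting new blocks.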



B. Multi-edge-type Interpretation 

Staircase codes have a simple graphical representation, which provides a multi-edge-type [3] interpretation of their construction. The term 'multi-edge-type' was originally applied to describe a refined class of irregular LDPC codes, in which variable nodes (and check nodes) are classified by their degrees with respect to a set of edge types. Intuitively, the introduction of multiple edge types allows degree-one variable nodes, punctured variable nodes, and other beneficial features that are not admitted by the conventional irregular ensemble. In turn, better performance for finite blocklengths and fixed decoding complexities is possible.

In Fig. 5, we present the factor graph representation of a decoder that operates on a window of $L = 4$ blocks; the graph for general $L$ follows in an obvious way. Dotted variable nodes indicate symbols whose value was decoded in the previous stage of decoding. The key observation is that when these symbols are correctly decoded (which is essentially always the case, since the output BER is required to be less than $10^{-15}$), the component codewords in which they are involved are effectively shortened by $m$ symbols. Therefore, the most reliable messages are passed over those edges connecting variable nodes to the shortened (component) codewords, as indicated in Fig. 5. On the other hand, the rightmost collection of variable nodes are (with respect to the current decoding window) only involved in a single component codeword, and thus the edges to which they are connected carry the least reliable messages. Due to the nature of iterative decoding, the intermediate edges carry messages whose reliability lies between these two extremes.



C. A G.709-compatible Staircase Code

The ITU-T Recommendation G.709 defines the framing structure and error-correcting coding rate for OTNs. For our purposes, it suffices to know that an optical frame consists of 130560 bits, 122368 of which are information bits, and the remaining 8192 are parity bits, which corresponds to error-correcting codes of rate $R = 239/255$. Since $(510 - 32)/510 = 239/255$, we will consider a component code with $m = 510$ and $r = 32$. Specifically, the binary $(n = 1023, k = 993, t = 3)$ BCH code with generator polynomial

$$(x^{10} + x^3 + 1)(x^{10} + x^3 + x^2 + x + 1)(x^{10} + x^8 + x^3 + x^2 + 1)$$

is adapted to provide an additional 2-bit error-detecting mechanism, resulting in the generator polynomial²

$$g(x) = (x^{10} + x^3 + 1)(x^{10} + x^3 + x^2 + x + 1)(x^{10} + x^8 + x^3 + x^2 + 1)(x^2 + 1).$$

In order to provide a simple mapping to the G.709 frame, we first note that $2 \cdot 130560 = 510 \cdot 512$. This leads us to define a slight generalization of staircase codes, in which the blocks $B_i$ consist of 512 rows of 510 bits. The encoding rule is modified as follows:

1) Form the $512 \times 990$ matrix $A = [\tilde{B}_{i-1}^T \; B_{i,L}]$ (i.e., $512 + 478$ columns), where $\tilde{B}_{i-1}^T$ is obtained by appending two all-zero rows to the top of the matrix-transpose of $B_{i-1}$.

2) The entries of $B_{i,R}$ are then computed such that each of the rows of the matrix $[\tilde{B}_{i-1}^T \; B_{i,L} \; B_{i,R}]$ is a valid codeword of $C$. That is, the elements in the $j$th row of $B_{i,R}$ are exactly the 32 parity symbols that result from encoding the 990 'information' symbols in the $j$th row of $A$.

Here, $C$ is the code obtained by shortening the code generated by $g(x)$ by one bit, since our overall codeword length is $510 + 512 = 1022$.
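The generator polynomial can be checked, and used for systematic encoding, with carry-less polynomial arithmetic over GF(2). In this Python sketch polynomials are stored as integers (bit $k$ holds the coefficient of $x^k$), and the helper names are illustrative. Encoding 990 information bits as $x^{32}m(x)$ plus the remainder of $x^{32}m(x)$ modulo $g(x)$ gives a word of at most 1022 bits of the shortened code $C$.

```python
def pmul(a, b):
    """Product of two GF(2) polynomials stored as ints (bit k = coefficient of x^k)."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out


def pmod(a, g):
    """Remainder of a(x) divided by g(x) over GF(2)."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a


def poly(*exponents):
    """Build a GF(2) polynomial from the exponents of its non-zero terms."""
    return sum(1 << e for e in exponents)


BCH_FACTORS = [poly(10, 3, 0), poly(10, 3, 2, 1, 0), poly(10, 8, 3, 2, 0)]
g = 1
for f in BCH_FACTORS + [poly(2, 0)]:  # append (x^2 + 1) for the extra detection
    g = pmul(g, f)
R_PAR = g.bit_length() - 1            # number of parity bits = deg g(x)


def encode(info_bits):
    """Systematic encoding: x^r m(x) plus the remainder of x^r m(x) modulo g(x)."""
    m = sum(b << k for k, b in enumerate(info_bits))
    shifted = m << R_PAR
    return shifted ^ pmod(shifted, g)


info = [1, 0, 1] + [0] * 987          # 990 information bits
cw = encode(info)
print(R_PAR, cw.bit_length() <= 1022, pmod(cw, g))   # 32 True 0
```

Because $x^2 + 1 = (x + 1)^2$ divides $g(x)$, every codeword of $C$ has even weight, one visible effect of the added factor.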

V. Error Floor Analysis 

For iteratively decoded codes, an error floor (in the output bit-error rate) can often be attributed to error patterns that 'confuse' the decoder, even though such error patterns could easily be corrected by a maximum-likelihood decoder. In the context of LDPC codes, these error patterns are often referred to as trapping sets [8]. In the case of product-like codes with an iterative hard-decision decoding algorithm, we will refer to them as stall patterns, due to the fact that the decoder gets locked in a state in which no updates are performed, i.e., the decoder stalls, as in Fig. 6.

Definition 1: A stall pattern is a set $s$ of codeword positions, for which every row and column involving positions in $s$ has at least $t + 1$ positions in $s$.

We note that this definition includes stall patterns that are correctable, since an incorrect decoding may fortuitously cause one or more bits in $s$ to be corrected, which could then lead to all bits in $s$ eventually being corrected. In this section, we obtain an estimate for the error floor by overbounding the probabilities of these events, and pessimistically assuming that every stall pattern is uncorrectable (i.e., if any stall pattern appears during the course of decoding, it will appear in the final output). The methods presented for the error floor analysis apply to a general staircase code, but for simplicity of presentation, we will focus on a staircase code with $m = 510$ and doubly-extended triple-error-correcting component codes.

²This is the code applied to the rows (but not the slopes) of the I.9 code in G.975.1.

Fig. 6. A stall pattern for a staircase code with a triple-error-correcting component code. Since every involved component codeword has 4 errors, decoding stalls.

A. A Union Bound Technique 

Due to the streaming nature of staircase codes, it is necessary to account for stall patterns that span (possibly multiple) consecutive blocks. In order to determine the bit-error rate due to stall patterns, we consider a fixed block $B_i$, and the set of stall patterns that include positions in $B_i$. Specifically, we 'assign' to $B_i$ those stall patterns that include symbols in $B_i$ (and possibly additional positions in $B_{i+1}$) but no symbols in $B_{i-1}$. Let $S_i$ represent the set of stall patterns assigned to $B_i$. By the union bound, we then have

$$\mathrm{BER}_{\mathrm{floor}} \le \sum_{s \in S_i} \frac{|s|}{510^2} \Pr[\text{bits in } s \text{ in error}].$$

Therefore, bounding the error floor amounts to enumerating the set $S_i$, and evaluating the probabilities of its elements being in error.


B. Bounding the Contribution Due to Minimal Stalls 

Definition 2: A minimal stall pattern has the property that 
there are only t + 1 rows with positions in s, and only t + 1 
columns with positions in s. 

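The minimal stall patterns of a staircase code can be counted in a straightforward manner: the four codewords terminating in $B_{i+1}$ can be chosen in $\binom{510}{4}$ ways, and the remaining four comprise $k \ge 1$ codewords terminating in $B_i$ and $4 - k$ codewords terminating in $B_{i+2}$. The multiplicity of minimal stall patterns that are assigned to $B_i$ is therefore

$$M_{\min} = \binom{510}{4} \sum_{k=1}^{4} \binom{510}{k}\binom{510}{4-k},$$

and we refer to the set of minimal stall patterns by $S_{\min}$. The probability that the positions in some minimal stall pattern $s$ are received in error is $p^{16}$.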

Next, we consider the case in which not all positions in some minimal stall pattern $s$ are received in error, but, due to incorrect decoding(s), all positions in $s$ are, at some point during decoding, simultaneously in error. For some fixed $s$ and $l$, $1 \le l \le 16$, there are $\binom{16}{l}$ ways in which $16 - l$ positions in $s$ can be received in error. For the moment, let us assume that erroneous bit flips occur independently with some probability $\zeta$, and that $\zeta$ does not depend on $l$. Then we can overbound the probability that a particular minimal stall $s$ occurs by

$$\sum_{l=0}^{16} \binom{16}{l} p^{16-l} \zeta^{l} = (p + \zeta)^{16}.$$

In order to provide evidence in favor of these assumptions, Table I presents empirical estimates, for $l = 0$, $l = 1$ and $l = 2$, of the probability that a minimal stall pattern $s$ occurs during iterative decoding, given that $16 - l$ positions in $s$ are (intentionally) received in error. Note that even if a minimal stall is received, there exists a non-zero probability that it will be corrected as a result of erroneous decodings; we will ignore this effect in our estimation, i.e., we make the worst-case assumption that any minimal stall persists. Furthermore, from the results for $l = 1$ and $l = 2$, it appears that our stated assumptions regarding $\zeta$ hold true, and $\zeta \approx 5.8 \times 10^{-4}$. For $l > 2$, we did not have access to sufficient computational resources for estimating the corresponding probabilities. Nevertheless, based on the evidence presented in Table I, the error floor contribution due to minimal stall patterns is estimated as

$$\frac{16}{510^2} \cdot M_{\min} \cdot (p + \zeta)^{16},$$

where $\zeta = 5.8 \times 10^{-4}$ when $p = 4.8 \times 10^{-3}$.
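The minimal-stall estimate can be evaluated numerically. This Python sketch assumes the multiplicity $M_{\min} = \binom{510}{4}\sum_{k=1}^{4}\binom{510}{k}\binom{510}{4-k}$ (four codewords terminating in $B_{i+1}$, plus $k \ge 1$ terminating in $B_i$ and $4 - k$ terminating in $B_{i+2}$), together with the values of $p$ and $\zeta$ quoted above.

```python
from math import comb

M = 510                    # block dimension m used in the analysis
P, ZETA = 4.8e-3, 5.8e-4   # crossover probability p and the empirical zeta

# Four codewords terminating in B_{i+1}, plus k >= 1 terminating in B_i and
# 4 - k terminating in B_{i+2}.
M_min = comb(M, 4) * sum(comb(M, k) * comb(M, 4 - k) for k in range(1, 5))

# The same count via Vandermonde's identity.
assert M_min == comb(M, 4) * (comb(2 * M, 4) - comb(M, 4))

# Each of the 16 positions is in error with probability at most p + zeta.
floor_min = 16 / M**2 * M_min * (P + ZETA) ** 16
print(f"M_min = {M_min:.3e}, minimal-stall floor = {floor_min:.2e}")
```

Under these assumptions the minimal-stall contribution comes out at roughly $3.5 \times 10^{-21}$, i.e., most of the overall estimate of $3.8 \times 10^{-21}$ stated at the end of this section.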

TABLE I
Estimated probability of a minimal stall $s$, given that $16 - l$ positions are received in error

$l$    Estimated probability
0      149/150
1      1/1725
2      $(1/1772)^2$



C. Bounding the Contribution Due to Non-minimal Stalls 

We now wish to account for the error floor contribution of non-minimal stalls, e.g., the stall pattern illustrated in Fig. 7. In the general case, a stall pattern $s$ includes codeword positions in $K$ rows and $L$ columns, $K \ge 4$, $L \ge 4$; we refer to these as $(K, L)$-stalls. Furthermore, each $(K, L)$-stall includes $l$ positions, $4 \cdot \max(K, L) \le l \le KL$, where the lower bound follows from the fact that every row and column (in the stall) includes at least 4 positions. Note that there are

$$A_{K,L} = \binom{510}{L} \sum_{k=1}^{K} \binom{510}{k}\binom{510}{K-k}$$

ways to select the involved rows and columns.

For a fixed $(K, L) \neq (4, 4)$ and a fixed choice of rows and columns, we now proceed to overbound the contributions of candidate stall patterns. Without loss of generality, we assume that $K \ge L$, and note that there are $\binom{L}{4}^{K}$ ways of choosing $I = 4K$ elements (in the $L \times K$ 'grid' induced by the choice of rows and columns) such that each column includes exactly four elements, and that every stall pattern 'contains' at least one of these. Now, since a stall pattern includes $I$ elements, $4K \le I \le KL$, and its remaining $I - 4K$ elements lie among the other $KL - 4K$ grid positions, the number of stall patterns with $I$ elements is overbounded as

$$\binom{L}{4}^{K}\binom{KL-4K}{I-4K}.$$

Fig. 7. A non-minimal stall pattern for a staircase code with a triple-error-correcting component code.



For a general $(K, L) \neq (4, 4)$, it follows that the number of stall patterns with $I$ elements, $4\max(K, L) \le I \le KL$, is overbounded as

$$\binom{\min(K,L)}{4}^{\max(K,L)}\binom{KL-4\max(K,L)}{I-4\max(K,L)}.$$
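
The overbound can be sanity-checked by brute force on a small grid. This sketch (helper names hypothetical) enumerates every set of $I$ cells in a $K \times L$ grid, keeps the sets in which every row and every column holds at least $t+1 = 4$ cells, and compares the exact count with $\binom{\min(K,L)}{4}^{\max(K,L)}\binom{KL-4\max(K,L)}{I-4\max(K,L)}$. It tests only the combinatorial condition used in the text, not decoder behavior.

```python
from itertools import combinations
from math import comb


def is_stall(cells, rows, cols, t=3):
    """True if every row and every column of the grid holds at least t + 1 cells."""
    row_counts = [0] * rows
    col_counts = [0] * cols
    for r, c in cells:
        row_counts[r] += 1
        col_counts[c] += 1
    return min(row_counts) > t and min(col_counts) > t


def count_stalls(K, L, I):
    """Exact number of I-cell stall patterns in a K x L grid (brute force)."""
    grid = [(r, c) for r in range(K) for c in range(L)]
    return sum(1 for cells in combinations(grid, I) if is_stall(cells, K, L))


def stall_bound(K, L, I):
    """Overbound from the text: C(min, 4)^max * C(KL - 4 max, I - 4 max)."""
    big, small = max(K, L), min(K, L)
    return comb(small, 4) ** big * comb(K * L - 4 * big, I - 4 * big)


for I in range(20, 26):
    exact, bound = count_stalls(5, 5, I), stall_bound(5, 5, I)
    print(I, exact, bound)
    assert exact <= bound
```

For $(K, L) = (5, 5)$ and $I = 20$, every row and column must hold exactly four cells, so the complement of each pattern is a permutation matrix and the exact count is $5! = 120$, well below the bound of $5^5 = 3125$.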

Finally, over all choices of the $K$ rows and $L$ columns, there are at most

$$M_{K,L,I} = N_{K,L}\binom{\min(K,L)}{4}^{\max(K,L)}\binom{KL-4\max(K,L)}{I-4\max(K,L)}$$

$(K, L)$-stalls with $I$ elements.

Fig. 8. Performance of an $R = 239/255$ staircase code on a binary symmetric channel with crossover probability $\mathrm{BER}_{\mathrm{in}}$, compared with various G.975.1 codes. The upper scale plots the equivalent binary-input Gaussian channel $Q$ (in dB), where $\mathrm{BER}_{\mathrm{in}} = \tfrac{1}{2}\,\mathrm{erfc}(Q/\sqrt{2})$.

For a fixed $K$ and $L$, the contribution to the error floor can be estimated as

$$\sum_{I=4\max(K,L)}^{KL}\frac{I}{510^2}\, M_{K,L,I}\,(p+\zeta)^{I},$$

and in Table II we provide values for various $K$ and $L$, when $\zeta = 5.8\times10^{-4}$ and $p = 4.8\times10^{-3}$.
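
Putting the pieces together, the following sketch evaluates the sum above for a given $(K, L) \neq (4, 4)$, with $N_{K,L}$ and $M_{K,L,I}$ as written in this section and the defaults $p = 4.8\times10^{-3}$, $\zeta = 5.8\times10^{-4}$. Python integers keep the binomial counts exact before the final floating-point product. With these inputs, $(K, L) = (5, 5)$ comes out at about $2.6\times10^{-22}$, in the same range as the corresponding Table II entry; it is a reading of the formulas, not the authors' code, and the function names are hypothetical.

```python
from math import comb

SIDE = 510  # block dimension used in the counts of this section


def n_rows_cols(K, L):
    """N_{K,L} = C(510, L) * sum_m C(510, m) C(510, K - m)."""
    return comb(SIDE, L) * sum(comb(SIDE, m) * comb(SIDE, K - m) for m in range(K + 1))


def m_stalls(K, L, I):
    """M_{K,L,I}: overbound on the number of (K, L)-stalls with I elements."""
    big, small = max(K, L), min(K, L)
    return n_rows_cols(K, L) * comb(small, 4) ** big * comb(K * L - 4 * big, I - 4 * big)


def contribution(K, L, p=4.8e-3, zeta=5.8e-4):
    """Sum over I of (I / 510^2) * M_{K,L,I} * (p + zeta)^I."""
    big = max(K, L)
    return sum(I / SIDE**2 * m_stalls(K, L, I) * (p + zeta) ** I
               for I in range(4 * big, K * L + 1))


for K, L in [(5, 5), (6, 6), (7, 7)]:
    print(K, L, f"{contribution(K, L):.2e}")
```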

Note that the dominant contribution to the error floor is due to minimal stall patterns (i.e., $K = L = 4$), and that the overall estimate for the error floor of the code is $3.8\times10^{-21}$. Finally, we note that by a similar (but more cumbersome) analysis, the error floor of the G.709-compliant staircase code is estimated to occur at $4.0\times10^{-21}$.



VI. Simulation Results

In Fig. 8, simulation results, generated in hardware on an FPGA implementation, are provided for the G.709-compatible staircase code, for $L = 7$. We also present the bit-error-rate curves for the G.975 RS code, as well as the G.975.1 codes described in Section I. For an output error rate of $10^{-15}$, the staircase code provides approximately 9.41 dB of net coding gain, which is within 0.56 dB of the Shannon limit, and an improvement of 0.42 dB relative to the best G.975.1 code.

VII. Conclusions

We proposed staircase codes, a class of product-like FEC codes that provide reliable communication for streaming sources. Their construction admits low-latency encoding and variable-latency decoding, as well as a decoding algorithm with an efficient hardware implementation. For $R = 239/255$, a G.709-compatible staircase code was presented, and performance within 0.56 dB of the Shannon limit at $10^{-15}$ was demonstrated via an FPGA-based simulation.



TABLE II
Contribution to error floor estimate of (K, L)-stall patterns

K    L    Contribution
4    4    3.55 × 10^-21
4    5    7.81 × 10^-28
5    5    2.54 × 10^-22
5    6    2.21 × 10^-28
6    6    1.40 × 10^-23
6    7    1.49 × 10^-29
7    7    8.53 × 10^-25
7    8    1.83 × 10^-30



Appendix 

This section briefly describes known techniques for efficiently decoding triple-error-correcting binary BCH codes, and discusses the data-flow associated with a lookup-table-based decoder architecture.

For a syndrome $S = (S_1, S_3, S_5)$, $S_i \in \mathbb{F}_{2^m}$, we first compute $D_3 = S_1^3 + S_3$ and $D_5 = S_1^5 + S_5$. A triple-error-correcting decoder distinguishes the cases

$$\begin{aligned}
v = 0 &: \quad S_1 = S_3 = S_5 = 0\\
v = 1 &: \quad S_1 \neq 0,\; D_3 = D_5 = 0\\
v = 2 &: \quad S_1 \neq 0,\; D_3 \neq 0,\; S_1 D_5 = S_3 D_3
\end{aligned}$$

where $v$ is the number of positions to invert in order to obtain a valid codeword.
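
A minimal sketch of the case split, assuming $m = 10$ and the primitive polynomial $x^{10} + x^3 + 1$ to build $\mathbb{F}_{2^m}$ with log/antilog tables (both are choices of this example, not taken from the text). Errors are represented by their locators, i.e., nonzero field elements. The fallback to $v = 3$ for every remaining syndrome is also an assumption of the sketch, since the text lists only $v = 0, 1, 2$.

```python
M = 10                    # field degree m (assumed; the text later uses m = 10)
PRIM = 0b10000001001      # x^10 + x^3 + 1, assumed primitive polynomial
Q = 1 << M                # field size 2^m

# Antilog (EXP) and log (LOG) tables; EXP is doubled so that
# LOG[a] + LOG[b] can index it without a modulo.
EXP = [0] * (2 * (Q - 1))
LOG = [0] * Q
x = 1
for i in range(Q - 1):
    EXP[i] = EXP[i + Q - 1] = x
    LOG[x] = i
    x <<= 1
    if x & Q:             # degree reached m: reduce modulo PRIM
        x ^= PRIM


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def power(a, k):
    return 0 if a == 0 else EXP[(LOG[a] * k) % (Q - 1)]


def syndromes(locators):
    """(S1, S3, S5) for errors at the given distinct, nonzero locators."""
    s1 = s3 = s5 = 0
    for X in locators:
        s1 ^= X
        s3 ^= power(X, 3)
        s5 ^= power(X, 5)
    return s1, s3, s5


def classify(s1, s3, s5):
    """Number v of positions to invert, following the case split above."""
    d3 = power(s1, 3) ^ s3
    d5 = power(s1, 5) ^ s5
    if s1 == 0 and s3 == 0 and s5 == 0:
        return 0
    if s1 != 0 and d3 == 0 and d5 == 0:
        return 1
    if s1 != 0 and d3 != 0 and mul(s1, d5) == mul(s3, d3):
        return 2
    return 3              # assumption: everything else goes to the v = 3 path


errs = [EXP[5], EXP[100], EXP[700]]   # arbitrary distinct error locators
for v in range(4):
    print(v, classify(*syndromes(errs[:v])))
```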
In order to determine the corresponding positions, a reciprocal error-locator polynomial $\sigma(x)$ is defined, the roots of which identify the positions. From [16], we have:

$$\begin{aligned}
v = 1 &: \quad \sigma(x) = x + S_1\\
v = 2 &: \quad \sigma(x) = x^2 + S_1 x + D_3/S_1\\
v = 3 &: \quad \sigma(x) = x^3 + S_1 x^2 + b x + S_1 b + D_3
\end{aligned}$$

where

$$b = (S_1^2 S_3 + S_5)/D_3.$$

When $v = 2$, note that all of the coefficients of $\sigma(x)$ are nonzero, since $S_1 \neq 0$ and $D_3 \neq 0$ in that case.
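
The locator formulas can be checked by evaluating $\sigma(x)$ at the error locators it was built from. The sketch below reuses the same illustrative field construction ($m = 10$, $x^{10}+x^3+1$ assumed primitive) and evaluates with Horner's rule; every printed value should be zero. Function names are hypothetical.

```python
M = 10                    # field degree m (assumed)
PRIM = 0b10000001001      # x^10 + x^3 + 1, assumed primitive polynomial
Q = 1 << M

EXP = [0] * (2 * (Q - 1))
LOG = [0] * Q
x = 1
for i in range(Q - 1):
    EXP[i] = EXP[i + Q - 1] = x
    LOG[x] = i
    x <<= 1
    if x & Q:
        x ^= PRIM


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def power(a, k):
    return 0 if a == 0 else EXP[(LOG[a] * k) % (Q - 1)]


def inv(a):
    return EXP[(Q - 1 - LOG[a]) % (Q - 1)]


def syndromes(locators):
    s1 = s3 = s5 = 0
    for X in locators:
        s1 ^= X
        s3 ^= power(X, 3)
        s5 ^= power(X, 5)
    return s1, s3, s5


def locator(s1, s3, s5, v):
    """Coefficients of sigma(x), highest degree first, for v = 1, 2, 3."""
    d3 = power(s1, 3) ^ s3
    if v == 1:
        return [1, s1]
    if v == 2:
        return [1, s1, mul(d3, inv(s1))]                 # D3 / S1
    b = mul(mul(power(s1, 2), s3) ^ s5, inv(d3))         # (S1^2 S3 + S5) / D3
    return [1, s1, b, mul(s1, b) ^ d3]                   # S1 b + D3


def evaluate(coeffs, point):
    """Horner's rule over GF(2^m)."""
    acc = 0
    for c in coeffs:
        acc = mul(acc, point) ^ c
    return acc


errs = [EXP[5], EXP[100], EXP[700]]
for v in (1, 2, 3):
    sigma = locator(*syndromes(errs[:v]), v)
    print(v, [evaluate(sigma, X) for X in errs[:v]])
```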

It remains to determine the roots of $\sigma(x)$. For $v = 1$, it is trivial to determine the error location. For $v = 2$ or $v = 3$, lookup-based methods for solving the corresponding quadratic and cubic equations are described in [17], [18]. In the remainder of this section, we briefly describe these methods and discuss their data-flow.

For a quadratic equation $f_X(x) = x^2 + ax + b$ with $a \neq 0$, substitute $x = ay$ to obtain

$$f_Y(y) = a^2\left(y^2 + y + b/a^2\right).$$

If $f_Y(r) = 0$, then $f_X(ar) = 0$. Thus the problem of finding roots of $f_X(x)$ reduces to the problem of finding roots of the suppressed quadratic $f_Y(y)$, which can be solved by lookup using a table with $2^m$ entries, each of which is a pair of elements in $\mathbb{F}_{2^m}$. Therefore, when $v = 2$, decoding requires $2m$ bits to be read from a lookup-table memory.
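
A sketch of the quadratic lookup under the same illustrative field ($m = 10$, $x^{10}+x^3+1$ assumed primitive). The table `QUAD` (a hypothetical name) is indexed by $c = y^2 + y$ and stores the root pair $(r, r+1)$, so solving $x^2 + ax + b$ costs one read of $2m$ bits after computing $b/a^2$. Because $y \mapsto y^2 + y$ is $\mathbb{F}_2$-linear with kernel $\{0, 1\}$, exactly half of the $2^m$ entries stay empty; they correspond to quadratics with no roots in the field.

```python
M = 10                    # field degree m (assumed)
PRIM = 0b10000001001      # x^10 + x^3 + 1, assumed primitive polynomial
Q = 1 << M

EXP = [0] * (2 * (Q - 1))
LOG = [0] * Q
x = 1
for i in range(Q - 1):
    EXP[i] = EXP[i + Q - 1] = x
    LOG[x] = i
    x <<= 1
    if x & Q:
        x ^= PRIM


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def inv(a):
    return EXP[(Q - 1 - LOG[a]) % (Q - 1)]


# Entry c holds a root pair (r, r + 1) of y^2 + y + c, or None if there is none.
QUAD = [None] * Q
for y in range(Q):
    c = mul(y, y) ^ y
    if QUAD[c] is None:
        QUAD[c] = (y, y ^ 1)


def solve_quadratic(a, b):
    """Roots of x^2 + a x + b (a != 0) via x = a y and one table read."""
    entry = QUAD[mul(b, inv(mul(a, a)))]   # index by b / a^2
    if entry is None:
        return None                        # no roots in GF(2^m)
    r1, r2 = entry
    return mul(a, r1), mul(a, r2)


X1, X2 = EXP[5], EXP[100]                  # arbitrary distinct error locators
print(solve_quadratic(X1 ^ X2, mul(X1, X2)), (X1, X2))
```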

Similarly, for a cubic equation $f_X(x) = x^3 + ax^2 + bx + c$, substitute $x = y + a$ to obtain

$$f_Y(y) = y^3 + (a^2 + b)y + ab + c.$$

Note that $y f_Y(y)$ is a linearized polynomial with respect to $\mathbb{F}_2$, and hence the set of zeros of $y f_Y(y)$ is a vector space over $\mathbb{F}_2$. In particular, the roots of $y f_Y(y)$, if distinct, are of the form $\{0, r_1, r_2, r_1 + r_2\}$. Thus only $r_1$ and $r_2$ need to be stored in the lookup table.

Two cases arise, depending on the value of $a^2 + b = D_5/D_3$. If $D_5 = 0$, so that $a^2 + b = 0$, then $f_Y(y) = y^3 + ab + c$, and the roots can be found by finding the cube roots of $ab + c = D_3$, which requires lookup using a table with $2^m$ entries, each of which is a pair of elements in $\mathbb{F}_{2^m}$. If $D_5 \neq 0$, so that $a^2 + b \neq 0$, substitute $y = (a^2 + b)^{1/2} z$ to obtain

$$f_Z(z) = (a^2 + b)^{3/2}\left(z^3 + z + \frac{ab + c}{(a^2 + b)^{3/2}}\right),$$

where

$$\frac{ab + c}{(a^2 + b)^{3/2}} = \left(\frac{D_3^5}{D_5^3}\right)^{1/2}.$$



The roots of the suppressed cubic $f_Z(z)$ can be found by lookup using a table with $2^m$ entries, each of which is a pair of elements in $\mathbb{F}_{2^m}$. Therefore, in either case, decoding requires $2m$ bits to be read from a lookup-table memory.
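
A sketch of the two-case cubic solver, again over the illustrative field ($m = 10$, $x^{10}+x^3+1$ assumed primitive). Each table entry keeps two roots, and the third is recovered as $r_1 + r_2$. That is valid here because in both cases the three roots sum to zero: the cube roots of a value are $r, r\omega, r\omega^2$ with $1 + \omega + \omega^2 = 0$, and $z^3 + z + t$ has no $z^2$ term. The cube-root table relies on $3 \mid 2^m - 1$, which holds for even $m$. Table and function names are hypothetical.

```python
M = 10                    # field degree m (assumed; must be even for CUBE below)
PRIM = 0b10000001001      # x^10 + x^3 + 1, assumed primitive polynomial
Q = 1 << M

EXP = [0] * (2 * (Q - 1))
LOG = [0] * Q
x = 1
for i in range(Q - 1):
    EXP[i] = EXP[i + Q - 1] = x
    LOG[x] = i
    x <<= 1
    if x & Q:
        x ^= PRIM


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def power(a, k):
    return 0 if a == 0 else EXP[(LOG[a] * k) % (Q - 1)]


def inv(a):
    return EXP[(Q - 1 - LOG[a]) % (Q - 1)]


def sqrt(a):
    """Unique square root in GF(2^m): a^(2^(m-1))."""
    return power(a, 1 << (M - 1))


def pair_table(f):
    """Entry c stores up to two distinct nonzero y with f(y) = c."""
    table = [[] for _ in range(Q)]
    for y in range(1, Q):
        roots = table[f(y)]
        if len(roots) < 2:
            roots.append(y)
    return table


CUBE = pair_table(lambda y: power(y, 3))             # y^3 = c
DEPRESSED = pair_table(lambda z: power(z, 3) ^ z)    # z^3 + z = c


def solve_cubic(a, b, c):
    """Three distinct roots of x^3 + a x^2 + b x + c, or None."""
    lin = mul(a, a) ^ b            # a^2 + b  (D5 / D3 in the decoder)
    const = mul(a, b) ^ c          # ab + c   (D3 in the decoder)
    if lin == 0:
        scale, entry = 1, CUBE[const]                 # y^3 = ab + c
    else:
        scale = sqrt(lin)                             # y = (a^2 + b)^(1/2) z
        entry = DEPRESSED[mul(const, inv(power(scale, 3)))]
    if len(entry) < 2:
        return None                # not three distinct roots
    r1, r2 = entry
    ys = [mul(scale, r) for r in (r1, r2, r1 ^ r2)]   # roots r1, r2, r1 + r2
    return [y ^ a for y in ys]                        # undo x = y + a


def from_roots(X1, X2, X3):
    """Coefficients (a, b, c) of (x + X1)(x + X2)(x + X3)."""
    return (X1 ^ X2 ^ X3,
            mul(X1, X2) ^ mul(X1, X3) ^ mul(X2, X3),
            mul(mul(X1, X2), X3))


errs = (EXP[5], EXP[100], EXP[700])   # arbitrary distinct error locators
print(sorted(solve_cubic(*from_roots(*errs))), sorted(errs))
```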

Finally, for $n = n_1 = n_2$, the data-flow contribution of the lookup-table-based decoding architecture is $4mvD/(nR)$. For $n = 1000$, $m = 10$, $v = 4$, $R = 239/255$ and $D = 100$ Gb/s, the corresponding data-flow is 17.1 Gb/s, which is small relative to the data-flow that arises due to those effects considered in Section III-A2.
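
The quoted figure can be reproduced with simple arithmetic. The breakdown below is an interpretation chosen because it matches the 17.1 Gb/s result: $v$ is taken to be the number of decoding passes over each component codeword (its definition is in Section III-A2, not shown here), each coded bit lies in two component codewords of length $n$, and each decoding reads $2m$ bits, which gives $4mvD/(nR)$. The function name is hypothetical.

```python
def lookup_dataflow_gbps(n, m, v, rate, d_gbps):
    """Lookup-table read bandwidth 4 m v D / (n R), in Gb/s (interpretation)."""
    coded_rate = d_gbps / rate                 # coded bit rate D / R
    decodings_per_pass = 2 * coded_rate / n    # each bit in two length-n codewords
    bits_per_decoding = 2 * m                  # one pair of field elements read
    return v * decodings_per_pass * bits_per_decoding


print(round(lookup_dataflow_gbps(1000, 10, 4, 239 / 255, 100), 1))
```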

References 

[1] W. D. Grover, "Forward error correction in dispersion-limited lightwave systems," J. Lightw. Technol., vol. 6, no. 5, pp. 643-645, May 1988.

[2] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.

[3] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge, UK: Cambridge University Press, 2008.

[4] I. B. Djordjevic, M. Arabaci, and L. L. Minkov, "Next generation FEC for high-capacity communication in optical transport networks," J. Lightw. Technol., vol. 27, no. 16, pp. 3518-3530, Aug. 2009.

[5] T. Mizuochi, "Recent progress in forward error correction and its interplay with transmission impairments," IEEE J. Sel. Topics Quantum Electron., vol. 12, no. 4, pp. 544-554, Jul. 2006.

[6] Z. Zhang, V. Anantharam, M. J. Wainwright, and B. Nikolic, "An efficient 10GBASE-T ethernet LDPC decoder design with low error floors," IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 843-855, Apr. 2010.

[7] A. Darabiha, A. Chan Carusone, and F. R. Kschischang, "Power reduction techniques for LDPC decoders," IEEE J. Solid-State Circuits, vol. 43, no. 8, pp. 1835-1845, Aug. 2008.

[8] T. Richardson, "Error floors of LDPC codes," in Proc. 41st Allerton Conf. Comm., Control, and Comput., Monticello, IL, 2003.

[9] T. Mizuochi et al., "Experimental demonstration of concatenated LDPC and RS codes by FPGAs emulation," IEEE Photon. Technol. Lett., vol. 21, no. 18, pp. 1302-1304, Sep. 2009.

[10] A. J. Feltström, D. Truhachev, M. Lentmaier, and K. S. Zigangirov, "Braided block codes," IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2640-2658, Jun. 2009.

[11] W. Zhang, M. Lentmaier, K. S. Zigangirov, and D. J. Costello Jr., "Braided convolutional codes: A new class of turbo-like codes," IEEE Trans. Inf. Theory, vol. 56, no. 1, pp. 316-331, Jan. 2010.

[12] C. P. M. J. Baggen and L. M. G. M. Tolhuizen, "On diamond codes," IEEE Trans. Inf. Theory, vol. 43, no. 5, pp. 1400-1411, Sep. 1997.

[13] T. Fuja, C. Heegard, and M. Blaum, "Cross parity check convolutional codes," IEEE Trans. Inf. Theory, vol. 35, no. 6, pp. 1265-1276, Nov. 1989.

[14] A. D. Wyner and R. B. Ash, "Analysis of recurrent codes," IEEE Trans. Inf. Theory, vol. 9, no. 3, pp. 143-156, 1963.

[15] G. D. Forney Jr., "Burst-correcting codes for the classic bursty channel," IEEE Trans. Commun., vol. 19, no. 5, pp. 772-781, Oct. 1971.

[16] I. S. Reed and X. Chen, Error-Control Coding for Data Networks. Boston, MA: Kluwer Academic Publishers, 1999.

[17] R. T. Chien, B. D. Cunningham, and I. B. Oldham, "Hybrid methods for finding roots of a polynomial with application to BCH decoding," IEEE Trans. Inf. Theory, vol. 15, pp. 329-335, Mar. 1969.

[18] E. R. Berlekamp, H. Rumsey, and G. Solomon, "On the solution of algebraic equations over finite fields," Inform. Contr., vol. 10, pp. 553-564, 1967.



